Abstract

Failures of cooperation cause many of society’s gravest problems. It 
is well known that cooperation among many players faced with a social 
dilemma can be maintained thanks to the possibility of punishment, but 
achieving the initial state of widespread cooperation is often much more 
difficult. We show here that there exist strategies of ‘targeted punishment’
whereby a small number of punishers can shift a population of defectors 
into a state of global cooperation. The heterogeneity of players, often 
regarded as an obstacle, can in fact boost the mechanism’s effectivity. 
We conclude by outlining how the international community could use a 
strategy of this kind to combat climate change. 


1 Introduction 

When Svante Arrhenius enunciated his greenhouse law in 1896, atmospheric
concentration of CO2 stood at its highest in over half a million years - about
300 ppm [1][2]. It has now surpassed 400 ppm [3]. Our continued failure to avoid
the well-known consequences of global warming is not rooted in some technical
impossibility, but in a lack of international cooperation. It is a classic
example of the ‘tragedy of the commons’, as popularised by Garrett Hardin
through the metaphor of herdsmen with access to common pasture land: each
can always prosper individually by adding another head of cattle to his herd,
but eventually this leads to overgrazing and ruin for all. Other instances
include overfishing, deforestation, and many kinds of pollution. The solution
advocated by Hardin was “mutual coercion, mutually agreed upon”, which is
usually taken to mean coercion by a Hobbesian central authority [8]. Some
argue that an alternative option is to privatise the commons [9], although the
coercion is still implicit in the assumption that property rights can be
enforced. Ironically perhaps, it is in local communities with access to some
resource similar to Hardin’s common pasture land where self-organization to
cooperate has often been documented. And indeed, such cases usually involve
rules, mutually agreed upon, and enforced by the possibility of some form of
punishment.

In game theory, the tragedy of the commons is seen as a Nash equilibrium,
where no rational agent cooperates despite its being the strategy which would
maximise collective payoff if adopted by all players. Since cooperation
is, nevertheless, pervasive in nature and society, much theoretical and empirical
work has gone into understanding why this might be so.

The focus has been on behaviour in the face of ‘social dilemmas’ - situations
where there is some communal benefit in choosing a cooperative strategy, but
also a temptation to defect (not cooperate). In one-to-one games such as the
prisoner's dilemma, a strategy of conditional cooperation can be individually
advantageous if the game is iterated and players are able to remember each other
[15][22]. A better model for commons management, however, is the public goods
game [23][24]. Each player can choose how many tokens to put into a common
pot which multiplies the total amount by some factor (greater than one and
smaller than the number of players), and redistributes the result equally among
all players. Many experiments with humans have shown that cooperation (i.e.
adding to the pool) can be enhanced by allowing players to punish defectors,
despite the punisher incurring a cost for doing so.

One aspect of social dilemmas which is not usually taken into account in
theoretical studies is the heterogeneity of players: even in lab experiments
where the small number of subjects are all students, significantly different
attitudes to cooperation are found [18]. In cases where the players are nation
states, the differences are much larger. When it comes to tackling global
warming, for instance, the heterogeneity in gross and per capita emissions,
vulnerability to climate change, dependence on fossil fuels, historical
responsibility, technical and financial ability to adapt, and many other
relevant variables is widely regarded as confounding the problem.

When public goods experiments are run in the lab, with the same group
of subjects playing iteratively, cooperation tends to be high at first and
gradually dwindle thereafter, possibly as cooperators become frustrated with the
behaviour of defectors [18]. Allowing players to punish defectors from the start
in such settings can discourage would-be defectors and maintain cooperation. In
the real world, however, the problem is often not just one of maintaining
cooperation, but of achieving it in an environment of almost ubiquitous
defection. For instance, in a society with very little corruption, maintaining
this happy state is relatively easy, since anyone attempting to break the rules
would swiftly be identified and punished. In an environment of entrenched
corruption, however, there are usually too many defectors and too few resources
to change the state of affairs [26].

It is often assumed that punishment - and indeed positive incentives - must
be seen as fair, and there is some evidence that unfair or inconsistent
punishment fails to maintain cooperation in lab experiments [27]. But in
situations of widespread defection, the punishing capacity of would-be punishers
(usually a subset of cooperators), if applied equally to all defectors, can be
too dilute to have any effect. Here we use a simple model to show how, in such
situations, there exist strategies of ‘targeted punishment’ which punishers can
adopt in order to escape the tragedy of the commons and bring about universal
cooperation. Far from being an obstacle, the existence of heterogeneity among
the players contributes to the strategy’s effectivity, and may, perhaps, serve to
assuage any feelings of unfairness.

2 Results


2.1 Maintaining vs achieving cooperation 

Let us consider a set of N players faced with a social dilemma of some kind.
At any given moment, each player can choose either to cooperate or to defect.
Depending on the details of the situation, a player will perceive a net payoff
associated with each strategy. Let us call the difference of these perceived
payoffs H_i for player i. Thus, if i has all the relevant information, and is
entirely selfish and rational, she will cooperate if H_i > 0 and defect if
H_i < 0. We shall consider, however, that the degree to which these assumptions
hold can be captured by a ‘rationality’ parameter β, in such a way that i has,
at each time step t, a probability P_i of cooperating and a probability 1 - P_i
of defecting, where

$$P_i = \frac{1}{2}\left[\tanh(\beta H_i) + 1\right]. \tag{1}$$

This sigmoidal form coincides with the transition probabilities for the spins in
an Ising model, and for the neurons in a Hopfield neural network [28]. Behaviour
is completely random if β = 0, and becomes deterministic (perfectly rational)
when β → ∞. The need to take this feature into account is suggested by work
in evolutionary game theory and behavioural economics which has highlighted
the importance of somewhat stochastic or bounded rationality.
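
The two limits of Eq. (1) can be checked numerically. The following is a minimal Python sketch; the function name `cooperation_probability` is our own label, not taken from the paper. Note that Eq. (1) is the logistic function in disguise, since (tanh x + 1)/2 = 1/(1 + e^(-2x)).

```python
import math


def cooperation_probability(H, beta):
    """Eq. (1): probability of cooperating for payoff balance H and rationality beta."""
    return 0.5 * (math.tanh(beta * H) + 1.0)


def logistic_form(H, beta):
    """Equivalent logistic expression 1 / (1 + exp(-2 * beta * H))."""
    return 1.0 / (1.0 + math.exp(-2.0 * beta * H))
```

With beta = 0 the function returns 0.5 whatever the payoff balance (random behaviour), while for large beta it approaches a step function of the sign of H (deterministic behaviour).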

What form shall we choose for H_i? We are interested in situations where, in
the absence of interaction with the rest of the population, most of the players are
predisposed to defect. However, these predispositions can be heterogeneously
distributed. Let us assume, with no loss of generality, that the sequence i =
1, 2, ... N positions the players in order of their predisposition, from most to
least intrinsically cooperative. For simplicity, let us consider that the
predisposition h_i of player i is given by the linear expression
h_i = -(i - 2)/(N - 2). Thus, for any N, the first player is the only one with a
slight tendency to cooperate, the second one has no inherent tendency, and each
successive player has a greater tendency to defect than the previous one, down
to the last with h_N = -1. In addition to this individual effect, each player
can be influenced by the others. For instance, let us assume that a certain
number of players n_p have each a capacity π to punish defecting players they
consider at fault, of which there are n_f. The total punishment befalling a
defector among the n_f is then p_i = π n_p/n_f. The balance of payoffs for a
given player considered at fault is now

$$H_i = p_i + h_i = \pi\frac{n_p}{n_f} - \frac{i-2}{N-2}. \tag{2}$$

(Note that n_p and n_f can change with time, although for clarity we refrain
from making this explicit.) This is also the balance H_i for players who are
cooperating but who would become at fault if they were to defect (with the
small adjustment that, since such a player would presumably not punish herself,
she has p_i = π n̂_p/n̂_f, where n̂_p and n̂_f are the values of n_p and n_f that
there would be if this player defected). Meanwhile, for players not considered
at fault irrespectively of their strategies, p_i = 0.
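
As a worked illustration of Eq. (2), the sketch below evaluates the payoff balance of a defector at fault. The function names are hypothetical, and the numbers (N = 200, π = 0.4, half the population cooperating and punishing the other half) are chosen only for the example.

```python
def predisposition(i, N):
    """Linear predisposition h_i = -(i - 2)/(N - 2) for players i = 1..N."""
    return -(i - 2) / (N - 2)


def payoff_balance(i, N, pi, n_p, n_f):
    """Eq. (2) for a defector at fault: H_i = pi * n_p / n_f + h_i (needs n_f >= 1)."""
    return pi * n_p / n_f + predisposition(i, N)
```

Since H_i > 0 exactly when i < 2 + (N - 2) π n_p/n_f, the example values give a threshold of 2 + 198 × 0.4 = 81.2: players up to i = 81 have a positive balance, while player 82 and those after her do not.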

This simple model captures the features of social dilemmas required to
illustrate how targeted punishment can work, without sacrificing generality by
going into the details of a given game. However, the parameter π could be
adjusted to describe, for instance, the public goods game with specific
punishment costs.

Which players can punish, and whom should they punish? In many real
situations, it is only possible for cooperators to punish defectors. In this
case, n_p is equal to the number of cooperators, n_c, at any given time. For now
we shall focus on this case, although the possibility of defectors also
punishing other defectors is discussed below.¹ As to who should be punished,
this is in fact the only “rule” that the community has freedom to determine -
or, more precisely, that those in a position to punish can determine. The
simplest (and arguably fairest) rule would be for all defectors to be punished.
In this case, n_f is equal to the number of defectors, n_f = N - n_c.

We run computer simulations of the situation described above for N = 200
players (roughly the number of countries in the world) and compute the average
proportion of cooperators, p = n_c/N, once a stationary state has been reached.
Figure 1 shows this proportion on a colour scale for a range of the two
parameters, β (rationality) and π (punishment). In panel (a), we see that for
almost all parameter combinations global cooperation is obtained. However, there
is an important detail: for these simulations, we have set the initial strategy
of every player to ‘cooperate’. The lesson we can learn, therefore, is that in
these conditions global cooperation can be maintained once it has been achieved.
But what about if the initial strategies are all set to ‘defect’? In Figure 1b
we show the results for this case. There is now a much smaller region of global
cooperation, requiring significantly higher levels of punishment π than are
necessary simply to maintain cooperation. Interestingly, while a certain degree
of rationality β is needed to achieve cooperation, thereafter the minimum
punishment enabling cooperation increases with rationality, implying that some
degree of randomness in the selection of strategies is globally beneficial.
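
A simulation of this kind might be sketched as follows. This is an illustrative Python sketch rather than the code used for Figure 1. It assumes a synchronous update of all players at each time step, and the number of steps, burn-in period and seed are arbitrary choices of ours.

```python
import math
import random


def step_equal_punishment(coop, beta, pi, rng):
    """One synchronous update when every cooperator punishes every defector."""
    N = len(coop)
    n_c = sum(coop)
    new_state = []
    for idx, cooperating in enumerate(coop):
        i = idx + 1  # players are labelled 1..N
        if cooperating:
            # Counts that would hold if this player defected
            # (she would not punish herself).
            n_p, n_f = n_c - 1, (N - n_c) + 1
        else:
            n_p, n_f = n_c, N - n_c
        H = pi * n_p / n_f - (i - 2) / (N - 2)   # Eq. (2)
        P = 0.5 * (math.tanh(beta * H) + 1.0)    # Eq. (1)
        new_state.append(rng.random() < P)
    return new_state


def stationary_fraction(N, beta, pi, start_cooperating,
                        steps=400, burn_in=200, seed=0):
    """Average proportion of cooperators after discarding a burn-in period."""
    rng = random.Random(seed)
    coop = [start_cooperating] * N
    total = 0.0
    for t in range(steps):
        coop = step_equal_punishment(coop, beta, pi, rng)
        if t >= burn_in:
            total += sum(coop) / N
    return total / (steps - burn_in)
```

Calling `stationary_fraction(200, beta, pi, True)` and `stationary_fraction(200, beta, pi, False)` over a grid of `beta` and `pi` values gives the two initial conditions compared in panels (a) and (b).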

What is happening here? To gain a better understanding of the phenomenon,
we consider the fixed points of the dynamics. These are values p* such that, when
p = p*, the subsequent value to which the system naturally evolves is, again, p*.
A fixed point can be either stable or unstable: if a small deviation from p* would
tend to return the system to p*, it is stable; whereas it is unstable if random
fluctuations around this point are amplified and the system driven to some other
value of p. In Methods we analyse the fixed points and their stability, and show
the results for three different combinations of parameters in the bottom panels
of Figure 1. Figure 1c corresponds to a level of punishment π = 0.4. The lines
show the fixed points as functions of β, with stable fixed points plotted in red
and unstable ones in blue. Arrows show the direction in which the system will
tend to evolve depending on the value of p (away from unstable fixed points
and towards stable ones). First of all, we observe that global defection (p = 0)
is always unstable, while global cooperation (p = 1) is stable for any β > 0.
There are two further fixed points, one stable and one unstable. If the system
begins with sufficient cooperators that p is above the unstable one, it will evolve
towards global cooperation. However, if the initial p is below this, evolution will
be towards the other stable fixed point. This explains the difference between
the top two panels, where global cooperation is observed when all players begin
cooperating, but not when they start off defecting. Figure 1d shows a situation
of higher punishment, π = 0.6. There are now still regions of β for which the
stable fixed point at low p acts as a trap when all players begin defecting; but
an interval has appeared in which there is an uninterrupted path from p = 0 to
p = 1. This corresponds to the region in the top right panel where cooperation
can be observed at this π for intermediate values of β. Finally, in Figure 1e we
set β = 2.5 and plot the fixed points against π. Again we see that, for π below
a certain value, there is a stable fixed point at low p which acts as a trap, while
high enough π will ensure global cooperation irrespectively of initial conditions.

¹ There is no reason, in principle, why defectors should not be able to retaliate
by punishing the punishers. However, since the focus here is on situations where
all players recognise the collective benefits of cooperation, as in the case of
global warming, we shall leave this possibility unexplored.

Figure 1. (a) Stationary proportion of cooperators, p, for a range of
rationality, β, and punishment, π, from Monte Carlo simulations of the model
when all cooperators punish all defectors, and initially all N = 200 players
cooperate. (b) As before, but now all players initially defect. (c) Fixed points
of the dynamics against β, when π = 0.4; stable fixed points are depicted in
red, unstable ones in blue. (d) As in (c), but with π = 0.6. (e) Fixed points of
the dynamics against π, when β = 2.5. (The fixed-point analysis is described
in Methods.)

2.2 Paths to cooperation 

As noted above, players with the ability to punish others have the freedom to
decide whom to punish. It may seem fairest to punish all defectors equally, but
when these are numerous this approach dissipates the total punishing capacity.
Consider, instead, the following rule. A defecting player i is only deemed at
fault at time t if the one immediately before her in the ordering, player i - 1,
cooperates at time t. This rule, which we shall refer to as the ‘single file
strategy’, is illustrated in Figure 2a. According to this view, the number of
players considered at fault, n_f, will be smaller than the total number of
defectors when these are in the majority, while the scenario becomes identical
to the previous one when almost all the players cooperate. In Figure 3a we show
the results for simulations in which punishers adopt this strategy. As in
Figure 1b, all players initially defect. The region of global cooperation is now
significantly larger than in Figure 1b: harmony can be achieved at much lower
values of punishment π, particularly if rationality β is high. Because at any
one time only a very small number of defectors are deemed at fault, even a low
level of punishment is sufficient to make them cooperate. As each new player
switches strategy, it passes on the burden of responsibility to another one down
the line, resulting in a cascade of defectors becoming cooperators. A secondary
effect is that, as the ranks of cooperators grow, the total punishment they are
able to inflict increases, although as we show in Supplementary Material this is
not essential for the mechanism to work.

Figure 2. Diagrams illustrating the two strategies of targeted punishment
described in the main text: (a) is the ‘single file’ strategy, and (b) is the
‘groups’ strategy with groups of size ν = 3 and a threshold θ = 2/3. Players
are arranged from most to least inherently cooperative; those currently
cooperating are shown in red and those defecting in blue. A black arrow
indicates a defector who is considered at fault (and therefore liable to be
punished) according to the strategy, while a grey arrow signals a cooperator
who would be at fault if she were defecting.
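
The single file rule, and the punishment p_i it implies when only cooperators punish, can be sketched in Python as follows. The function names are our own. The text does not say how the first player is treated under this rule, so the sketch assumes she is always liable, as the text states for the first group under the groups strategy.

```python
def liable_single_file(coop):
    """liable[k] is True if player k+1 would be at fault were she to defect."""
    # Assumption: player 1 has no predecessor and is always liable.
    return [True if k == 0 else coop[k - 1] for k in range(len(coop))]


def punishment_single_file(coop, pi):
    """Punishment p_i for each player, with cooperators as the only punishers."""
    liable = liable_single_file(coop)
    n_c = sum(coop)
    # Defectors currently at fault share the punishing capacity.
    n_f = sum(1 for is_liable, c in zip(liable, coop) if is_liable and not c)
    p = []
    for is_liable, cooperating in zip(liable, coop):
        if not is_liable:
            p.append(0.0)
        elif cooperating:
            # Values that would hold if this cooperator defected instead.
            p.append(pi * (n_c - 1) / (n_f + 1))
        else:
            p.append(pi * n_c / n_f)
    return p
```

When every player defects, only player 1 is liable and nobody is yet able to punish, so under these assumptions the cascade has to be started by her own slight predisposition to cooperate.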

The single file strategy allows for global cooperation to ensue from widespread
defection in situations where this would not have been possible with equal
allocation of punishment. Thus, selecting only certain players for culpability
can provide an escape route from the tragedy of the commons. However, this is not
necessarily the best rule to ensure such an outcome; in fact, there are regions
at low π and β where, according to Figure 1, global cooperation is sustainable,
yet not achievable via this route. So consider now the following arrangement,
which we can call the ‘groups strategy’. Players are allocated to groups of size
ν, such that players i = 1, ... ν belong to the first group, i = ν + 1, ... 2ν to
the second, and so forth. A defector belonging to group m is deemed to be at fault
at time t if and only if at least a proportion θ of the players in group m - 1
cooperate at time t. This strategy is illustrated in Figure 2b. In Figure 3b we
show simulation results for this scenario, with 20 groups of ν = 10 players and
a threshold of θ = 80%, where, as before, all players begin defecting. (To set
the process off, players in the first group are always considered at fault if they
defect.) An even greater region of parameter space now corresponds to global
cooperation, leaving only very low levels of π and β out of reach. For a given
number of players, and set of inherent tendencies, h_i, there will be optimal
rules, or ‘targeted punishment strategies’, which come closest to ensuring global
cooperation for any values of rationality and punishment. (Note that the single
file strategy is an instance of the more general groups strategy when ν = 1 and
θ = 100%.)

Figure 3. (a) As Figure 1b (all players initially defect), but now the ‘single
file’ strategy is applied. (b) As in Figure 1b, but under the ‘groups’ strategy
with ν = 10 and θ = 80%. (See the main text and Figure 2 for descriptions of
these strategies.) (c) Difference between Figure 1a (all players initially
cooperate) and Figure 3a. (d) Difference between Figure 1a and Figure 3b. (e)
Speed v = N/r, where r is the number of time steps required to achieve global
cooperation, for the situation in Figure 3a. (f) Speed v for the case of
Figure 3b.
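
The groups rule can be sketched in the same way. The Python function below is illustrative only: its name is ours, and the small tolerance used when converting the threshold θ into a head count is an implementation choice, not part of the model.

```python
import math


def liable_groups(coop, nu, theta):
    """liable[k] is True if player k+1 would be at fault were she to defect."""
    liable = []
    for k in range(len(coop)):
        g = k // nu  # zero-based index of the group containing player k+1
        if g == 0:
            # Players in the first group are always at fault if they defect.
            liable.append(True)
        else:
            previous = coop[(g - 1) * nu: g * nu]
            # Cooperators needed in the previous group; the tolerance guards
            # against floating-point error in theta * group size.
            needed = math.ceil(theta * len(previous) - 1e-9)
            liable.append(sum(previous) >= needed)
    return liable
```

With `nu = 1` and `theta = 1.0` the rule reduces to the single file strategy, as noted in the text.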

Figures 3c and 3d show the difference, Δp, between the maximum density
of cooperators achievable (i.e. when all players initially cooperate), and the
results of Figures 3a and 3b, respectively. About 12% of the parameter space
shown corresponds to situations where cooperation is possible but not achievable
via the single file strategy. For the groups strategy, however, little over 3% of
the potential parameter space remains out of reach. Another aspect to take
into account when comparing punishment strategies is the speed with which
cooperation can be achieved. Figures 3e and 3f show the quantity v = N/r for
each rule, where r is the number of time steps required to achieve cooperation.
In most of the parameter range, cooperation is achieved sooner with ν = 10
than with ν = 1.

As remarked above, in many real situations it is only the cooperators who are
seen as having the ability or legitimacy to punish defectors. However, defectors
too could, in principle, punish other defectors, even if this may be regarded as
somewhat unfair. For instance, in many societies criminals pay value added
tax on their purchases, thereby contributing indirectly to the penal system. In
Supplementary Material we perform the same analysis for the case in which
n_p = N; that is, all players contribute to the punishment of defectors. The
dynamics is qualitatively similar to the situation in which only cooperators can
punish, the main difference being that global cooperation can, unsurprisingly,
ensue from lower levels of punishment per player. If, on the other hand, only a
fraction a of cooperators were to punish, the situation would be as in Figures 1
and 3 after rescaling π → aπ.
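
The rescaling can be read off Eq. (2). As a short check, assume the number of punishers is n_p = a n_c, with a the punishing fraction just described; the punishment befalling a defector at fault is then

$$p_i = \pi\frac{n_p}{n_f} = \pi\frac{a\,n_c}{n_f} = (a\pi)\frac{n_c}{n_f},$$

which is the expression for the case n_p = n_c with π replaced by aπ.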

The situations thus far examined involve a predisposition to cooperate, h_i,
with a specific functional form; and the punishing strategies assume that their
precise ordering is known. In Supplementary Material we relax these constraints
by adding a Gaussian noise to h_i, and randomly switching the ordering of 25%
of players with randomly chosen counterparts. We find that both punishing 
strategies described above are quite robust to these changes: the single file 
strategy is the most robust at high levels of both punishment and rationality, 
while the groups strategy is superior at low values of these parameters. 


3 Discussion 


Punishment has been shown to maintain cooperation in many social dilemma
settings [18], and it is generally assumed that such punishment should be
fair [27]. However, in situations of entrenched defection, the society’s punishing
capacity can become too dilute to have any effect if applied equally to all
defectors. The message of this paper is that even in such situations there can
exist strategies of ‘targeted punishment’ which allow a few initial punishers to
shift a large number of defectors into a state of global cooperation.

The paths to cooperation described above would seem to rely heavily on the 
possibility of punishment. Since punishing a defector presumably has some cost, 
such an act in itself constitutes a kind of cooperation. This is not necessarily a 
problem, given that humans and governments alike are wont to engage in “costly 
punishment” in a variety of settings. But, in any case, punishment
is only one potential mechanism which might give rise to a term pi with the 
characteristics we have here assumed. For instance, a determining factor in 
human behaviour often seems to be the anticipation of how one’s choices might 
affect those of others [21]. We know that our recycling, voting or travelling by 
bicycle will have little impact on the world per se, but we may rationally engage 
in these activities in the hope that others will follow suit. If the rules of the
game have been set up in such a way that our actions determine whether the 
next player in line will be expected to honour her conditional commitments, 
what seemed like a grain of sand in the desert becomes a grain of sand in an 
avalanche. If one imagines all eyes turned towards the single player whose turn 
it is to cooperate - or to the single small group of such players - it is easy to 
see how one might be more inclined to cooperate than in a world of distributed 
responsibility. 

One could argue that establishing the initial ordering would be an obstacle of 
similar magnitude to achieving cooperation directly. This may be the case when 
the players are alike in all respects. But an acknowledgement of heterogeneity 
might break this symmetry in a way acceptable to all, especially if there exist 
objective measures to establish, say, the effort each player would have to make to 
cooperate. Furthermore, a player fairly well inclined to cooperate but deterred 
by the mass of less well-predisposed companions might happily adopt an early 
position in the hope that the mechanism may bear fruit; while staunch defectors 
can leave the burden of responsibility to others by being placed further down 
the line, with the knowledge that they would only be called upon to participate 
if global cooperation were nigh. In any case, if the tool for convincing players to 
cooperate is some form of punishment, only the punishers need agree on whom 
to punish at any given time. 

It is worth reflecting that much social organization as we know it is in fact
achieved through an implicit arrangement of targeted punishment. Even the
most despotic tyrants cannot personally punish all dissenters. But if they can
exert power over a small group of underlings, who in turn manage their
subordinates, and so on down a hierarchical pyramid, top-down control can occur.
Similarly, most of us are subject to the judging gazes of only our immediate
friends and neighbours, yet this can be enough to ensure conformity to various
social conventions.

3.1 Targeted punishment in practice

Could a strategy of targeted punishment be implemented by the international
community to escape from global tragedies of the commons, such as
anthropogenic climate change? Ideally, countries might sign up voluntarily to
small groups, each of which would in turn be allocated a position in an
ordering. Alternatively, the ordering and groupings could follow automatically
from some objective measure, such as income per capita. Small nations already
making significant yet unsung progress, or particularly vulnerable ones, could
use their early positioning to draw attention to their situations in the hope of
having a wider effect. Others may initially welcome the temporary lifting of
responsibility by signing up to a group further down the line, but find
themselves obliged to cooperate once the spotlight came their way. Finally, even
the biggest polluters would run out of excuses once a majority of other groups
were cooperating. Some combination of sanctions and incentives could be
arranged, although the mere fact of the whole world’s eyes being focused on a
small number of defectors at any one time might prove a sufficient inducement in
many cases. We have already tried signing up to commitments, enshrining these in
law, privatising the commons through carbon trading schemes - yet yearly global
CO2 emissions are now about 30% higher than when the Kyoto Protocol was adopted.
Perhaps it is time for a new approach.


4 Methods 


According to Eqs. (1) and (2), the probability that player i will cooperate at
time step t + 1 is

$$P_i(t+1) = \frac{1}{2}\tanh\left[\beta\left(\pi\frac{n_p(t)}{n_f(t)} - \frac{i-2}{N-2}\right)\right] + \frac{1}{2},$$

where n_p(t) and n_f(t) are the numbers of punishing and punishable players,
respectively, at time t. If p_t = n_c(t)/N is the proportion of cooperating
players at time t, let us define the expected proportion of cooperating players
at time t + 1: G(p_t) = \bar{p}_{t+1} (this is an expected value in the sense
that the average of p_{t+1} over many independent realizations of the system
will converge to \bar{p}_{t+1}). We can then write

$$G(p_t) = \langle P_i(t+1) \rangle,$$

where ⟨·⟩ stands for an average over all players. For the case where n_p = n_c
and n_f = N - n_c (all defectors are punished by, and only by, all cooperators),
this becomes

$$G(p_t) = \frac{1}{2}\left\langle \tanh\left[\beta\left(\pi\frac{p_t}{1-p_t} - \frac{i-2}{N-2}\right)\right]\right\rangle + \frac{1}{2}. \tag{3}$$

Any value p* such that G(p*) = p* will be a fixed point of the dynamics.
Fluctuations around p* will tend to dampen out if

$$\left.\frac{dG(p_t)}{dp_t}\right|_{p_t = p^*} \in (-1, 1), \tag{4}$$

whereas if the absolute value of the derivative is larger than one, the fixed point
will be unstable, since even an infinitesimal fluctuation will drive the system to
a different state. The bottom panels of Figure 1 are obtained by solving Eqs.
(3) and (4) numerically. More generally, for any p_t, it is possible to determine
whether the system can be expected to evolve towards more or fewer cooperators
by the sign of G(p_t) - p_t.
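
One way to carry out such a numerical solution is sketched below in Python. The grid search, the bisection and the finite-difference derivative are our own choices for illustration, and the paper does not specify its numerical method. The map `G` follows Eq. (3) and the stability flag follows Eq. (4).

```python
import math


def G(p, beta, pi, N=200):
    """Eq. (3): expected cooperating fraction at t+1 given fraction p at t."""
    if p >= 1.0:
        return 1.0  # p/(1-p) diverges, so every tanh term tends to 1
    ratio = p / (1.0 - p)
    total = sum(math.tanh(beta * (pi * ratio - (i - 2) / (N - 2)))
                for i in range(1, N + 1))
    return 0.5 * total / N + 0.5


def fixed_points(beta, pi, N=200, grid=1000, eps=1e-6):
    """Fixed points in [0, 1) located by sign changes of G(p) - p.

    Returns a list of (p_star, is_stable) pairs.
    """
    def f(p):
        return G(p, beta, pi, N) - p

    points = []
    xs = [k / grid for k in range(grid)]
    for a, b in zip(xs, xs[1:]):
        fa, fb = f(a), f(b)
        if fa == 0.0 or fa * fb < 0.0:
            lo, hi = a, b
            for _ in range(60):  # bisection on the bracketing interval
                mid = 0.5 * (lo + hi)
                if f(lo) * f(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            root = 0.5 * (lo + hi)
            slope = (G(root + eps, beta, pi, N)
                     - G(root - eps, beta, pi, N)) / (2 * eps)
            points.append((root, abs(slope) < 1.0))  # criterion of Eq. (4)
    return points
```

Scanning `fixed_points(beta, 0.4)` over a range of `beta` values would trace curves of the kind shown in Figure 1c. The point p = 1 is handled separately because G(1) = 1 by construction.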
In the main text we describe two strategies of targeted punishment which
punishers can adopt when most of the players are defecting: 

• Single file: A defector is considered at fault if and only if the player
immediately before her in the ordering is currently cooperating.

• Groups: A defector is considered at fault if and only if at least a fraction θ
of the players making up the group immediately before hers in the ordering
are currently cooperating.

The ordering should arrange players in descending order of their net perceived 
payoff hi - that is, in increasing order of their temptation to defect. We show 
that by focusing their punishment on defectors considered at fault according to 
the rule adopted, punishers can shift a population of defectors towards global 
cooperation in many situations where attempting to punish all defectors would 
have no appreciable effect. But the generality of these results is limited by 
two assumptions: that the number of punishers is proportional to the number 
of cooperators; and that the perceived payoffs of players follow a linear form, 
which is known to punishers. In this appendix we relax both assumptions and 
find that the effectivity of targeted punishment does not depend strongly on 
such considerations. 


Constant punishment 

In the main text, we consider only scenarios in which the number of punishers
is equal to the number of cooperators. If it were, in fact, proportional to the
number of cooperators, this would simply involve a rescaling of the punishment
parameter, π. But what if all players were punishers, irrespectively of their
individual state of cooperation? Figure S1 shows the situations corresponding to
Figure 1 of the main text, with the difference that now the number of punishers
is n_p = N at all times. As in the case where only cooperators punish, there
is a large region of parameter space where cooperation is sustainable, but not
achievable when punishment is diluted among all defectors.

Figure S1. As Figure 1 of the main text, but with the number of punishers
being equal to the number of players: n_p = N. (a) All players initially
cooperate. (b) All players initially defect. Bottom panels: Stability diagrams
when π = 0.2 (c), π = 0.3 (d), and β = 2.5 (e).


The targeted punishment strategies described above are able to bring about
global cooperation in large regions of the parameter space, as can be seen in
Figure S2; this figure corresponds to Figure 3 of the main text, with the
difference that here n_p = N, instead of n_p = n_c. Note that, if we wished to
consider a situation where a fraction a of players were punishers, it would
suffice to rescale the punishment parameter as π → aπ in Figures S1 and S2.


Figure S2. As Figure 3 of the main text, but with the number of punishers
being equal to the number of players: n_p = N. (a) As Figure S1b (all players
initially defect), but now the ‘single file’ strategy is applied. (b) As in
Figure S1b, but under the ‘groups’ strategy with ν = 10 and θ = 80%. (c)
Difference between Figure S1a (all players initially cooperate) and Figure S2a.
(d) Difference between Figure S1a and Figure S2b. (e) Speed v = N/r, where r is
the number of time steps required to achieve global cooperation, for the
situation in Figure S2a. (f) Speed v for the case of Figure S2b.


Robustness to noise

In the main text we consider situations where player i’s payoff, in the absence
of punishment, is h_i = -(i - 2)/(N - 2), and players are arranged in descending
order of h. In real situations, punishers may not know the payoff perceived by
player i. We therefore corrupt this setting with two sources of noise to gauge
the strategies’ robustness to this imperfect knowledge. We now consider that
i’s payoff is h_i = -(i - 2 + η_i)/(N - 2), where the variables η_i are drawn from
a Gaussian with mean zero and variance σ². We also reshuffle the ordering, by
choosing a fraction f of the N players randomly, and switching their positions
in the ordering with randomly chosen players. Figure S3 shows a setting like
that of Figure 3 of the main text, after we have corrupted the payoffs with a
(quenched) noise set by σ = 1, and reshuffled a proportion f = 25% of players.
Although the targeted punishment strategies lose some of their effectivity, there
are still large regions of parameter space where their adoption leads to most
players cooperating.
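
The two sources of noise could be generated along the following lines. This Python sketch uses hypothetical function names. It assumes that each selected player is swapped with a counterpart drawn uniformly at random, which may occasionally be herself, and that the noise is drawn once and then held fixed (quenched).

```python
import random


def noisy_payoffs(N, sigma, rng):
    """h_i = -(i - 2 + eta_i)/(N - 2), with eta_i Gaussian of mean 0 and s.d. sigma."""
    return [-(i - 2 + rng.gauss(0.0, sigma)) / (N - 2) for i in range(1, N + 1)]


def reshuffled_ordering(N, f, rng):
    """Swap a fraction f of players with random counterparts.

    Returns a permutation of 1..N giving the ordering assumed by the punishers.
    """
    order = list(range(1, N + 1))
    for a in rng.sample(range(N), round(f * N)):
        b = rng.randrange(N)  # randomly chosen counterpart
        order[a], order[b] = order[b], order[a]
    return order
```

For the setting described in the text one would call `noisy_payoffs(200, 1.0, rng)` and `reshuffled_ordering(200, 0.25, rng)` with a single `random.Random` instance, then apply the punishment rules to the reshuffled ordering while each player responds to her own noisy payoff.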

In Figure S4 we carry out the same corruption of payoffs and of the ordering
as in Figure S3, but here we consider the situation where all players are
punishers: n_p = N. Again, the strategies can be seen to be fairly robust under
these conditions.

Figure S3. As Figure 3 of the main text, after the payoffs have been
corrupted with a noise drawn from a Gaussian of mean zero and variance
σ² = 1; and a random proportion f = 25% of players have had their positions
in the ordering switched with other random players. Punishment strategies are
‘single file’ in the panels on the left [(a), (c) and (e)], and ‘groups’ in those
on the right [(b), (d) and (f)]. (a) and (b) Stationary proportion of
cooperators, p. (c) and (d) Difference between maximum p maintainable [as
displayed in Figure 1(a) of the main text], and the results of Figures S3a and
S3b, respectively. (e) and (f) Speed v = N/r, where r is the number of time steps
required to achieve global cooperation, for the situations in Figures S3a and
S3b, respectively.



Figure S4. As Figure S2 (all players punish), after the payoffs have been
corrupted with a noise drawn from a Gaussian of mean zero and variance
σ² = 1; and a random proportion f = 25% of players have had their positions
in the ordering switched with other random players. Punishment strategies are
‘single file’ in the panels on the left [(a), (c) and (e)], and ‘groups’ in those
on the right [(b), (d) and (f)]. (a) and (b) Stationary proportion of
cooperators, p. (c) and (d) Difference between maximum p maintainable [as
displayed in Figure S1a], and the results of Figures S4a and S4b, respectively.
(e) and (f) Speed v = N/r, where r is the number of time steps required to
achieve global cooperation, for the situations in Figures S4a and S4b,
respectively.

